@zhjwpku (Collaborator) commented Jan 17, 2026

No description provided.

Change DataFileSet from std::unordered_set to a custom class that preserves
insertion order, similar to Java's DataFileSet which uses LinkedHashSet.
This is important for row ID assignment in v3 manifests, where row IDs
are assigned based on the order files are written.

The implementation uses both a vector (for insertion order) and an
unordered_set (for O(1) duplicate detection) to maintain the same API
while preserving order.
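
The vector-plus-set approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FileEntry`, `InsertionOrderedFileSet`, and the use of the file path as the identity key are all assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data file; only the path is used as identity.
struct FileEntry {
  std::string path;
};

// Hypothetical set that rejects duplicates but iterates in insertion order.
class InsertionOrderedFileSet {
 public:
  // Returns true if the file was added, false if its path was already present.
  bool Insert(FileEntry file) {
    // The unordered_set gives average O(1) duplicate detection.
    if (!seen_paths_.insert(file.path).second) {
      return false;
    }
    // The vector records the order in which distinct files arrived.
    ordered_.push_back(std::move(file));
    return true;
  }

  std::size_t size() const { return ordered_.size(); }
  bool empty() const { return ordered_.empty(); }

  // Files in the order they were first inserted.
  const std::vector<FileEntry>& files() const { return ordered_; }

 private:
  std::vector<FileEntry> ordered_;
  std::unordered_set<std::string> seen_paths_;
};
```

Iterating over `files()` visits entries in the order they were first added, which is the property that order-dependent row ID assignment relies on. A re-inserted duplicate keeps its original position rather than moving to the end.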
#include "iceberg/result.h"
#include "iceberg/type_fwd.h"
#include "iceberg/update/snapshot_update.h"
#include "iceberg/util/content_file_util.h"
Member


nit: Consider moving DataFileSet out of content_file_util.h to avoid including more headers than needed. We may also need some test cases for it.

Collaborator Author


ok, do you mind if I do the refactor in a separate PR, along with some of the other TODOs we mentioned in this PR?

Member


Yes, feel free to add them as followups.

Member

wgtmac commented Jan 20, 2026

Thanks a lot for working on this! This completes the 0.2.0 milestone.

@wgtmac wgtmac merged commit a457099 into apache:main Jan 20, 2026
10 checks passed
@zhjwpku zhjwpku deleted the add_fast_append branch January 20, 2026 10:46
3 participants